SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
Kim, Minchan, Jeong, Myeonghun, Lee, Joun Yeop, Kim, Nam Soo
SegINR leverages an optimal text encoder to extract embeddings, transforming each embedding into a segment of frame-level features using a conditional implicit neural representation (INR). This method, named segment-wise INR (SegINR), models the temporal dynamics within each segment and autonomously defines segment boundaries, reducing computational costs. We integrate SegINR into a two-stage TTS framework, using it for semantic token prediction. Our experiments in zero-shot adaptive TTS scenarios demonstrate that SegINR outperforms conventional methods in speech quality while remaining computationally efficient.
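To make the segment-wise INR idea concrete, the following is a minimal sketch of a conditional INR that maps one text-encoder embedding to a variable-length segment of frame-level features, with the boundary decided by the model itself through a stop logit. This is an illustrative assumption, not the authors' implementation: the names (SegmentINR, unroll_segment), the dimensions, and the stop-logit mechanism are all hypothetical.

```python
import torch
import torch.nn as nn

class SegmentINR(nn.Module):
    """Hypothetical conditional INR: (segment embedding, frame index) -> frame feature."""
    def __init__(self, emb_dim=256, feat_dim=80, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim + 1, hidden),  # condition embedding + scalar frame index
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
        )
        self.to_feat = nn.Linear(hidden, feat_dim)  # frame-level feature
        self.to_stop = nn.Linear(hidden, 1)         # segment-boundary logit

    def forward(self, emb, t):
        # emb: (B, emb_dim) text embedding; t: (B, 1) frame position within the segment
        h = self.net(torch.cat([emb, t], dim=-1))
        return self.to_feat(h), self.to_stop(h)

def unroll_segment(inr, emb, max_frames=50):
    """Generate frames for one segment until the INR's own stop logit fires."""
    frames = []
    for i in range(max_frames):
        t = torch.full((emb.size(0), 1), float(i))
        feat, stop = inr(emb, t)
        frames.append(feat)
        if torch.sigmoid(stop).item() > 0.5:  # boundary defined by the model, no duration predictor
            break
    return torch.stack(frames, dim=1)  # (B, T_segment, feat_dim)

# usage: one text embedding -> one segment of frame-level features
inr = SegmentINR()
segment = unroll_segment(inr, torch.randn(1, 256))
```

In training, the stop logit would presumably be supervised so that segment lengths match the data; the unroll above simply thresholds it for illustration.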
Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction
Kim, Minchan, Jeong, Myeonghun, Choi, Byoung Jin, Lee, Dongjune, Kim, Nam Soo
We introduce a text-to-speech (TTS) framework based on a neural transducer. We use discretized semantic tokens acquired from wav2vec 2.0 embeddings, which makes it easy to adopt a neural transducer for the TTS framework while benefiting from its monotonic alignment constraints. The proposed model first generates aligned semantic tokens using the neural transducer, then synthesizes a speech sample from the semantic tokens using a non-autoregressive (NAR) speech generator. This decoupled framework alleviates the training complexity of TTS and allows each stage to focus on 1) linguistic and alignment modeling and 2) fine-grained acoustic modeling, respectively. Experimental results on zero-shot adaptive TTS show that the proposed model outperforms the baselines in speech quality and speaker similarity on both objective and subjective measures. We also investigate the inference speed and prosody controllability of the proposed model, demonstrating the potential of neural transducers for TTS frameworks.
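As an illustration of the first stage, here is a minimal sketch of greedy neural-transducer decoding: it walks the text-encoder output monotonically and emits semantic tokens at each frame until a blank symbol advances to the next frame. The predictor and joiner are toy placeholders (a real transducer typically uses a stateful predictor network), and all names, vocabulary sizes, and dimensions are hypothetical rather than the paper's.

```python
import torch
import torch.nn as nn

BLANK = 0    # transducer blank symbol
VOCAB = 512  # hypothetical semantic-token vocabulary size

# Toy stand-ins: predictor embeds only the last emitted token;
# the joiner fuses encoder frame and predictor context into logits.
predictor = nn.Embedding(VOCAB, 128)
joiner = nn.Linear(128 + 128, VOCAB)

@torch.no_grad()
def greedy_decode(enc_out, max_symbols_per_step=5):
    """Monotonic greedy decoding: scan encoder frames left to right,
    emitting semantic tokens until the joiner predicts blank."""
    tokens = [BLANK]  # blank doubles as start-of-sequence context here
    for t in range(enc_out.size(0)):
        for _ in range(max_symbols_per_step):  # cap emissions per frame
            ctx = predictor(torch.tensor([tokens[-1]]))[0]      # (128,)
            logits = joiner(torch.cat([enc_out[t], ctx]))       # (VOCAB,)
            sym = int(logits.argmax())
            if sym == BLANK:
                break  # advance to the next encoder frame
            tokens.append(sym)
    return tokens[1:]  # aligned semantic tokens for the second stage

enc_out = torch.randn(7, 128)  # pretend text-encoder output (T = 7 frames)
semantic_tokens = greedy_decode(enc_out)
```

The second stage would then map the aligned semantic tokens (together with speaker conditioning) to speech in a single non-autoregressive pass, which is what keeps inference fast in this decoupled design.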